Automatic Mapping of Task and Data Parallel Programs for Efficient Execution on Multicomputers

Author

  • Jaspal Subhlok

Abstract

For a wide variety of applications, both task and data parallelism must be exploited to achieve the best possible performance on a multicomputer. Recent research has underlined the importance of exploiting task and data parallelism in a single compiler framework, and such a compiler can map a single source program in many different ways onto a parallel machine. There are several complex tradeoffs between task and data parallelism, depending on the characteristics of the program to be executed and the performance parameters of the target parallel machine. This makes it very difficult for a programmer to select a good mapping for a task and data parallel program. In this paper we isolate and examine specific characteristics of executing programs that determine the performance for different mappings on a parallel machine, and present an automatic system to obtain good mappings. The process consists of two steps: First, an instrumented input program is executed a fixed number of times with different mappings, to build an execution model of the program. Next, the model is analyzed to obtain a good final mapping of the program onto the processors of the parallel machine. The current implementation is static and feedback-driven, although the approach can be extended to a dynamic system. We demonstrate the system with an example program that is a model for many applications in the domains of signal processing and image processing. This research was sponsored by The Defense Advanced Research Projects Agency, Information Science and Technology Office, under the title “Research on Parallel Computing”, ARPA Order No. 7330, issued by DARPA/CMO under Contract MDA972-90C-0035. The views and conclusions contained in this document are those of the author and should not be interpreted as representing the official policies, either expressed or implied, of the U.S. government.
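The two-step process described in the abstract can be sketched as a simple feedback-driven search: execute the instrumented program under each candidate mapping a fixed number of times, then analyze the measurements to pick a final mapping. This is a minimal illustration, not the paper's actual system; the names (`run_instrumented`, the candidate enumeration) are hypothetical, and the real execution model captures far more than mean wall-clock time.

```python
# Hypothetical sketch of feedback-driven mapping selection.
# A "mapping" here is a tuple giving each parallel task a processor-group
# size: larger groups mean more data parallelism per task, while running
# several tasks at once exploits task parallelism.

import itertools

def candidate_mappings(num_tasks, num_procs):
    """Enumerate processor-group splits whose total fits the machine."""
    for split in itertools.combinations_with_replacement(
            [1, 2, 4, 8], num_tasks):
        if sum(split) <= num_procs:
            yield split

def select_mapping(run_instrumented, num_tasks, num_procs, trials=3):
    """Step 1: run the instrumented program a fixed number of times per
    candidate mapping.  Step 2: analyze the measurements (here, just the
    mean time) and return the best mapping found."""
    best, best_time = None, float("inf")
    for mapping in candidate_mappings(num_tasks, num_procs):
        times = [run_instrumented(mapping) for _ in range(trials)]
        mean = sum(times) / len(times)
        if mean < best_time:
            best, best_time = mapping, mean
    return best, best_time
```

For example, with two tasks on eight processors and a cost model where each task's time shrinks with its group size, the search settles on the (4, 4) split, i.e. both tasks data-parallel on half the machine each.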


Similar resources

A Thesis Submitted in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy

Loops are the main source of parallelism in scientific programs. Hence, several techniques were developed to detect parallelism in these loops and transform them into parallel forms. In this dissertation, compile-time transformations and efficient parallel execution of loops with various types of dependencies are investigated. First, Doacross loops with uniform dependencies are considered for ex...

Automatic Selection of Dynamic Data Partitioning Schemes for Distributed-Memory Multicomputers

For distributed-memory multicomputers such as the Intel Paragon, the IBM SP-2, the NCUBE/2, and the Thinking Machines CM-5, the quality of the data partitioning for a given application is crucial to obtaining high performance. This task has traditionally been the user's responsibility, but in recent years much effort has been directed to automating the selection of data partitioning schemes. Sev...

Optimizations Used in the PARADIGM Compiler for Distributed-Memory Multicomputers

The PARADIGM (PARAllelizing compiler for DIstributed-memory General-purpose Multicomputers) project at the University of Illinois provides a fully automated means to parallelize programs, written in a serial programming model, for execution on distributed-memory multicomputers. To provide efficient execution, PARADIGM automatically performs various optimizations to reduce the overhead and idle ...

A Framework for Exploiting Task and Data Parallelism on Distributed Memory Multicomputers

Distributed Memory Multicomputers (DMMs), such as the IBM SP-2, the Intel Paragon, and the Thinking Machines CM-5, offer significant advantages over shared memory multiprocessors in terms of cost and scalability. Unfortunately, the utilization of all the available computational power in these machines involves a tremendous programming effort on the part of users, which creates a need for sophis...

Trasgo 2.0: Code generation for parallel distributed- and shared-memory hierarchical systems

Current multicomputers are typically built as interconnected clusters of shared-memory multicore computers. A common programming approach for these clusters is to simply use a message-passing paradigm, launching as many processes as cores available. Nevertheless, to better exploit the scalability of these clusters and highly parallel multicore systems, it is necessary to effici...


Journal:

Volume   Issue 

Pages  -

Publication date: 1993